Synthesizing Efficient Out-of-Core Programs for Block Recursive Algorithms Using Block-Cyclic Data Distributions
Authors: Zhiyong
Abstract
In this paper, we present a framework for synthesizing I/O-efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver two-level memory model, in which data can be distributed using various cyclic(B) distributions, in contrast to the normally used physical track distribution cyclic(B_d), where B_d is the physical disk block size. We first introduce tensor bases to capture the semantics of block-cyclic distributions of out-of-core data and of the access patterns to that data. We then present program generation techniques for tensor products and matrix transposition. We accurately express the number of parallel I/O operations required by the synthesized programs for tensor products and matrix transposition as a function of the tensor bases and data distributions. We introduce an algorithm to determine the data distribution that optimizes the performance of the synthesized programs. Further, we formalize the procedure of synthesizing efficient out-of-core programs for tensor product formulas with various block-cyclic distributions as a dynamic programming problem. We demonstrate the effectiveness of our approach through several examples. We show that choosing an appropriate data distribution can reduce the number of passes over out-of-core data by as much as a factor of eight for a tensor product, and that the dynamic programming approach can substantially reduce the number of passes over out-of-core data for the overall tensor product formulas.
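To make the role of the cyclic(B) distribution concrete, the following is a minimal Python sketch under a deliberately simplified cost model, not the paper's own program generator or cost formula. It assumes that record i lives on disk (i // B) mod D, and that one parallel I/O operation can fetch one block of B records from every disk at once; memory size and the physical block size B_d are ignored. The names `disk_of` and `parallel_ios` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def disk_of(i, block, disks):
    """Disk holding record i under a cyclic(block) distribution (assumed layout)."""
    return (i // block) % disks


def parallel_ios(indices, block, disks):
    """Parallel I/O count in the toy model: the largest number of distinct
    blocks that any single disk must supply for the requested records."""
    blocks_per_disk = defaultdict(set)
    for i in indices:
        blocks_per_disk[disk_of(i, block, disks)].add(i // block)
    return max((len(b) for b in blocks_per_disk.values()), default=0)


# A stride-4 access over 64 records on 4 disks, for three block sizes.
stride_access = range(0, 64, 4)
for b in (1, 4, 16):
    print(f"cyclic({b}): {parallel_ios(stride_access, b, 4)} parallel I/Os")
```

In this toy setting the same stride-4 access pattern needs 16, 4, and 1 parallel operations for cyclic(1), cyclic(4), and cyclic(16) respectively, which illustrates why the choice of distribution matters; the larger blocks also transfer records that the access does not use, a cost this sketch does not account for.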
Similar resources
SYNTHESIZING EFFICIENT OUT-OF-CORE PROGRAMS FOR BLOCK RECURSIVE ALGORITHMS USING BLOCK-CYCLIC DATA DISTRIBUTIONS
This paper presents a framework for synthesizing I/O-efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform and matrix transpositions. The programs are synthesized from tensor (Kronecker) product representations of algorithms. These programs are optimized for a striped two-level memory model in which the out-of-core data can have block-cyclic distribu...
Synthesizing Efficient Out-of-core Programs for Block Recursive Algorithms Using Block-cyclic Data Distributions
In this paper, we present a framework for synthesizing I/O efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation which is based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver's two-level memory mo...
Synthesizing Efficient Out-of-Core Programs for Block Recursive Algorithms Using Block-Cyclic Data Distributions
In this paper, we present a framework for synthesizing I/O efficient out-of-core programs for block recursive algorithms, such as the fast Fourier transform (FFT) and block matrix transposition algorithms. Our framework uses an algebraic representation which is based on tensor products and other matrix operations. The programs are optimized for the striped Vitter and Shriver's two-level memory ...
Generating Efficient Programs for Two-Level Memories from Tensor-products
This paper presents a framework for synthesizing efficient out-of-core programs for block recursive algorithms such as the fast Fourier transform (FFT) and Batcher's bitonic sort. The block recursive algorithms considered in this paper are described using tensor (Kronecker) product and other matrix operations. The algebraic properties of the matrix representation are used to derive efficient out-of...
Data Access Reorganizations in Compiling Out-of-Core Data Parallel Programs on Distributed Memory Machines
This paper describes optimization techniques for translating out-of-core programs written in a data parallel language like HPF to message passing node programs with explicit parallel I/O. We first discuss how an out-of-core program can be translated by extending the method used for translating in-core programs. We demonstrate that straightforward extension of in-core compilation techniques does n...
Publication date: 1996